Batch to Real-Time: Incremental Data Collection & Analytics Platform
نویسندگان
چکیده
Real-time data collection and analytics is a desirable but challenging feature to provide in dataintensive software systems. To provide highly concurrent and efficient real-time analytics on streaming data at interactive speeds requires a welldesigned software architecture that makes use of a carefully selected set of software frameworks. In this paper, we report on the design and implementation of the Incremental Data Collection & Analytics Platform (IDCAP). The IDCAP provides incremental data collection and indexing in real-time of social media data; support for real-time analytics at interactive speeds; highly concurrent batch data processing supported by a novel data model; and a front-end web client that allows an analyst to manage IDCAP resources, to monitor incoming data in real-time, and to provide an interface that allows incremental queries to be performed on top of large Twitter datasets.
منابع مشابه
A SCALLA: A Platform for Scalable One-Pass Analytics using MapReduce
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets using a cluster of machines. However, the traditional MapReduce model is not well-suited for one-pass analytics, since it is geared towards batch processing and requires the data se...
متن کاملA Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
متن کاملCellIQ : Real-Time Cellular Network Analytics at Scale
We present CellIQ, a real-time cellular network analytics system that supports rich and sophisticated analysis tasks. CellIQ is motivated by the lack of support for realtime analytics or advanced tasks such as spatio-temporal traffic hotspots and handoff sequences with performance problems in state-of-the-art systems, and the interest in such tasks by network operators. CellIQ represents cellul...
متن کاملDesign and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...
متن کاملArchitectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both, batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We ...
متن کامل